A new framework to perform routing at the Autonomous System level is proposed in this paper. This mechanism, called 
Chain Routing, uses complete orders as its main topological unit. Since complete orders are acyclic digraphs that possess 
a known topology, it is possible to define an acyclic structure to route packets between a group of Autonomous Systems. 
The adoption of complete orders also allows easy identification and avoidance of persistent route oscillations, eliminates 
the possibility of developing transient loops in paths, and provides a structure that facilitates the implementation of 
traffic engineering. Moreover, by combining Chain Routing with other mechanisms that implement complete orders 
in time, we suggest that it is possible to design a new routing protocol which could be more reliable and stable than 
BGP's current implementation. Although Chain Routing will require an increase of the message overhead and greater 
coordination between network administrators, the rewards in stability and resilience should more than compensate for 
this effort. 
1. Introduction 

The Border Gateway Protocol (BGP) has been the In- 
ternet's de facto routing protocol at the Autonomous Sys- 
tem (AS) level since its deployment in 1993. Currently 
defined in RFC 4271 [1], BGP is a decentralized routing 
algorithm in which every router independently computes 
its best path to each destination in its routing table. When 
a group of ASs and its routers adopt a stable set of paths 
to reach a destination, it is said that the network converges 
or finds a solution. However, BGP has been deemed to be 
unstable because it is prone to develop persistent cyclic 
behavior [2], and to suffer from delays during convergence 
[3] [4]. 

BGP routers form a complex distributed routing sys- 
tem with a rich set of interactions. In some situations, 
these may cause an excessive number of messages which 
consequently result in delays in the convergence of a BGP 
network [5] [2] [4]. This pathology is usually triggered by 
changes in the network's topology. Different solutions have 
been proposed to attain shorter convergence times, and the 
most recent ones [6] [7] attempt to define a temporal order 
to limit the undesirable effects of excessive messaging. 
Unfortunately, these mechanisms do not take into consid- 
eration the persistent route oscillations (PRO) that may 
develop in a BGP system [2]. Therefore, another independent 
strand of research has tried to solve this specific problem 
[8] [9] [10] [11] [12]. However, we argue that before any 
major modifications or replacements can be made to BGP, 
it is first necessary to analyze the topological structure of 
the network in which this routing protocol is employed. In 
other words, what would be the best routing framework to 
deliver information in a network with structure and topol- 
ogy similar to the Internet's? 

In an earlier publication [13], we demonstrated that, in 
its network core, the European section of the Internet pos- 
sesses rich path diversity which BGP does not currently 
exploit. A routing protocol which could exploit the Inter- 
net's path diversity may increase this network's resilience 
to failures and allow an effective implementation of traffic 
engineering. Evidently, an increase in the number of 
paths available may translate into a more complex routing 
algorithm, but this could be a fair price to pay for greater 
resilience and better traffic management, provided the 
overheads are quantifiable and bounded. 

In this paper we address BGP's inherent instabilities 
and its inability to exploit the Internet's path diversity by 
proposing a new routing framework, which we call Chain 
Routing. This framework employs a new topological unit, 
the complete order, to define acyclic paths to a destina- 
tion. Consequently, we will try to demonstrate that Chain 
Routing could help increase the Internet's resilience to 
failures and employ its path diversity. We go as far as 
demonstrating that our proposal is a feasible idea which could be 
implemented as one of the main components of a routing 
protocol; however, we do not define a fully-fledged routing 
protocol. Moreover, concrete proof that this framework 
performs better than BGP's current implementation is not 
offered here, but forms part of ongoing research which 
we aim to pursue in the future. 

This paper is organized as follows. Section 2 provides 
the background to this research. The mathematical 
concepts needed to justify the framework proposed here are 
introduced in Section 3. Then Section 4 provides a 
description of how complete orders can be employed to perform 
routing in a network and a demonstration that this model 
is feasible for implementation, via a small numerical 
analysis. In Section 5 an evaluation of the potential 
application of Chain Routing to increase network stability and 
its implementation costs is presented. Section 6 
demonstrates how complete orders in time might also be needed 
to enhance the stability of a network. Finally, Section 7 
discusses the advantages and disadvantages of using this 
framework before arriving at conclusions in Section 8. 

2. Background 

When a BGP router announces a destination, it also 
announces the path that is used to reach back this desti- 
nation. BGP uses Classless Interdomain Routing (CIDR) 
prefixes as destinations and the list of ASs that have passed 
the announcement as paths. Therefore, this protocol has 
been classified as a path-vector routing protocol. 

BGP selects only one path to reach back each destination 
in its routing table [1]. We name this BGP property 
the Preferred Paths Rule (PPR). It is important to 
notice that this restriction is a characteristic of the BGP 
protocol and not of the relationship that exists between 
ASs in the Internet. Proof of this assertion is the fact that 
many network administrators try to override the PPR in 
order to employ more than one route to reach a destination 
and thus implement traffic engineering [13]. It also 
illustrates that BGP was not originally designed to support 
traffic engineering. 

Another important BGP characteristic is that the selec- 
tion and announcement of a preferred path can be modi- 
fied by the network administrator. These adjustments are 
needed in order to accommodate the commercial agreements 
made by the owners of each AS. This BGP property is 
known as policy configuration, or just policies, and it plays 
an important role in the functionality of any routing pro- 
tocol at the AS level. 

The Internet's topology is usually modelled as an 
undirected graph, $G = (V, E)$, in which ASs are represented 
by the vertex set $V$, and their communication links by the 
edges $E$ that join them. Furthermore, in [13] we proposed 
to use digraphs, $D = (V, A)$, to represent how the 
announcement of destinations is restricted by policies. 
Consequently, directed edges or arcs $A$ would be used to model 
how the policies, implemented at ASs, shape the 
propagation of destinations in a network. This graph was called 
the announcement digraph of destination $i$, $D_{anc}(i)$, and 
its converse, the destination digraph of $i$, $D_{dst}(i)$, could 
represent how ASs may use different paths to reach back 
the announced destination. 
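
The relationship between the two digraphs can be sketched in a few lines of Python. This is only an illustration: the arc set below is hypothetical (it is not taken from the measured data), and the function name `converse` is our own.

```python
def converse(arcs):
    """Return the converse digraph: every arc (tail, head) is reversed."""
    return {(head, tail) for (tail, head) in arcs}

# Hypothetical announcement digraph of destination "i": each arc shows
# how the announcement of i is passed from one AS to the next.
d_anc = {("i", "a"), ("i", "b"), ("a", "c"), ("b", "c")}

# The destination digraph reverses each arc, so it shows which
# neighbours an AS may use to reach back the announced destination.
d_dst = converse(d_anc)
print(sorted(d_dst))
```

In this toy case AS c has two arcs leaving it in the destination digraph (towards a and towards b), which is the kind of path diversity that the PPR discards.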

On the other hand, the restrictions imposed by the PPR 
mean that, after selecting its preferred path, an AS can 
only announce one route per destination to its neighbor- 
ing ASs. Therefore, destinations in the Internet are an- 
nounced following an oriented tree or arborescence, which 
we call the BGP digraph of destination $i$, $D_{BGP}(i)$. 
Nevertheless, the BGP digraph does not show how destinations 
could propagate through the Internet if only the 
restrictions imposed by policies are taken into consideration (i.e., 
BGP's PPR is ignored). 

The graph-theoretic representations we propose above 
allow a better understanding of how information flows and 
how instabilities originate and cause problems in the 
Internet. We have divided the BGP instabilities observed by 
other authors into two different categories: persistent route 
oscillations (PRO) and convergence delays. 

2.1. Persistent route oscillations 

An early study on the instabilities of the Internet [2] 
uncovered that BGP could develop PRO when a group of 
ASs cannot find a unique solution to reach a destination 
due to conflicting BGP policies. This causes BGP routers 
to fall into a state where they alternate between different 
paths to a destination in an endless cyclic behavior. 

Further analysis of this pathology [5J [5] demonstrated 
that, in order to avoid the development of this problem, it 
is necessary to eliminate the cyclic relationships that may 
exist between the ASs involved in the selection process. 
To achieve this objective, a mathematical model called a 
dispute wheel was proposed by Griffin et al. [8] [9], who 
demonstrated that when dispute wheels do not develop, 
the network is free of PRO. An alternative way to 
formulate the dispute wheel model is that it tries to avoid the 
development of cycles in the announcement digraph of a 
destination. 

A different approach to avoid PRO was obtained 
through a set of guidelines which restrict the paths an AS 
can use to reach its neighbors depending on the commercial 
relationship between them [10]. Such guidelines reinforce 
the Internet's hierarchical structure and eliminate some of 
the potentially problematic alternative paths. Closer in- 
spection of this model shows that it employs policies to re- 
strict the network's path diversity and to adopt a directed 
tree when announcing and reaching back destinations. 

Recent solutions to the PRO problem [11] [12] propose 
to use different metrics to determine when a BGP sys- 
tem is oscillating and stop this behavior. Unfortunately, a 
common requisite of all the mechanisms described in this 
section is that BGP will always enforce its PPR to find a 
unique best path for each destination, which in turn fails 
to exploit the path diversity available at the core of the In- 
ternet [13]. We believe this is a missed opportunity since 
path diversity could be employed to increase the capacity 
and resilience of the network. 
2.2. Convergence delays 

BGP's slow convergence is said to be caused by the ex- 
cessive message exchange that sometimes develops after 
the network's topology has changed, while the error mes- 
sages propagate through the network and until a new so- 
lution is reached; some authors have called this transient 
state path exploration [6]. There have also been propos- 
als to speed up the convergence of BGP networks [HI [16] . 
So many resources have been devoted to studying this 
problem because, when the network is in a 
state of path exploration, it is more vulnerable to developing 
transient loops, which could in turn cause packets to 
be dropped. 

Two of the most recent solutions to this problem 
[6] [7] advocate using timestamps which effectively implement 
an absolute temporal order of the control messages that 
are produced in a network. Later, in Section 6, we will 
return to these mechanisms because we believe that time is 
an important constraint in the correct functionality of a 
routing protocol. 

The previous solutions to the convergence delays 
experienced in the Internet assume that the network will finally 
converge, but they fail to consider what could happen if the 
system develops PRO. As other researchers have 
demonstrated (Section 2.1), in order to avoid PRO it is 
imperative to eliminate the directed cycles present in the 
announcement and destination digraphs. The literature in 
the field of acyclic digraphs [17] [18] proves that the 
maximal acyclic digraph is the complete order. The following 
section provides the mathematical background needed to 
use topological complete orders in a graph. Later we will 
also discuss and apply complete orders in time. 



3. Complete Orders 

In contrast to (undirected) edges, arcs possess a 
direction; this means that an arc labeled ab has a tail (a) and a 
head (b) which represent the direction of the arc. An arc 
with the same head and tail (aa) is called a loop. And a 
cycle is a closed directed path of two or more arcs. 

Any digraph with no cycles is called an acyclic digraph. 
A partial order is an acyclic digraph in which the vertices 
possess the following three properties: irreflexive (there 
are no loops), asymmetric (if arc ab exists, then arc ba 
cannot exist) and transitive (if arcs ab and bc exist, then 
arc ac must exist). When a partial order is also complete, 
that is, all the previous properties apply to all the vertices 
in the digraph, then it is called a complete order, total 
order or linear order. Examples of both types of orders 
are provided in Fig. 1. It is known that a complete order 
with $n$ vertices has $n(n-1)/2$ arcs [17]. 
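
The defining properties can be checked mechanically. The following Python fragment is a minimal sketch, assuming arcs are stored as (tail, head) pairs; the function names and the vertex labels are ours and serve only as an illustration.

```python
from itertools import combinations

def complete_order(vertices):
    """Arcs of the complete order in which each vertex precedes all later ones."""
    return set(combinations(vertices, 2))

def is_complete_order(vertices, arcs):
    """Check the irreflexive, asymmetric, transitive and complete properties."""
    irreflexive = all(a != b for a, b in arcs)
    asymmetric = all((b, a) not in arcs for a, b in arcs)
    transitive = all((a, d) in arcs
                     for a, b in arcs for c, d in arcs if b == c)
    complete = all((a, b) in arcs or (b, a) in arcs
                   for a, b in combinations(vertices, 2))
    return irreflexive and asymmetric and transitive and complete

vertices = ["e", "a", "b", "c", "d"]
arcs = complete_order(vertices)
n = len(vertices)

print(len(arcs) == n * (n - 1) // 2)       # arc count n(n-1)/2
print(is_complete_order(vertices, arcs))
```

Removing any arc breaks completeness, and adding the reverse of an existing arc breaks asymmetry, so the check fails in both cases.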

Figure 1: A partial order (A) and its Hasse diagram (B); a complete 
order (C) and its Hasse diagram (D) 

Two digraphs are isomorphic when there is a one-to-one 
correspondence between their vertices and their arcs. 
Because of their completeness, all complete orders with the 
same number of vertices are isomorphic. It is also said that 
complete orders are maximal, because adding a new arc to 
this digraph forms a cycle. Therefore, a complete order is 
the maximal acyclic digraph that can be formed given a 
set of vertices. Another important property of complete 
orders is: 

Theorem 1 (from [17]). Every complete order has a 
unique transmitter and a unique receiver. 

This means that the subgraph of a destination digraph 
with the largest possible number of arcs, which is still 
acyclic and thus provides the greatest path diversity be- 
tween a source and a destination, is a complete order. 

A digraph can also be described as a binary relation on 
its vertex set, where each of the existing arcs ab in the di- 
graph is equivalent to the binary relation aRb. Therefore, 
properties for binary relations can be applied to digraphs 
and vice versa. 

Besides digraphs, another common representation of 
partial orders is the Hasse diagram: To construct this from 
a digraph D, first draw the vertices of D in vertical order 
such that a is below b if aRb, then draw all the digraph's 
arcs (which should have an upward direction if the order 
was done correctly), delete all arcs that could be implied 
by the transitive property, and finally delete the direction 
indicators in the remaining arcs. Examples of Hasse di- 
agrams for a partial and a complete order are shown in 
Fig. 1. 

A common mathematical notation of a partial order 
on a set $S$ is $(S, \prec)$ [12]. A completely ordered subset of 
a partial order is also called a chain. For example, in 
Fig. 1(B) vertices a, b, c and d form a chain, denoted 
here as $C(a, b, c, d)$. The height $H(S, \prec)$ of a partial order 
$(S, \prec)$ is one less than the number of vertices in a 
maximum length chain in $(S, \prec)$. Therefore, the partial order 
in Fig. 1(B) has height 3, because both maximum length 
chains, $C(a, b, c, d)$ and $C(e, b, c, d)$, have 4 vertices; and 
the complete order's height is 4. 
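
The height can be computed as the number of arcs on a longest directed path. The sketch below is illustrative only: since the figure is not reproduced here, the Hasse diagram `hasse_b` is an assumption chosen to be consistent with the two maximum length chains C(a, b, c, d) and C(e, b, c, d) named in the text.

```python
from functools import lru_cache
from itertools import combinations

def height(arcs):
    """Height of a partial order: vertices in a longest chain, minus one.

    In an acyclic digraph this equals the number of arcs on a longest
    directed path, so the covering arcs of a Hasse diagram are enough.
    """
    successors = {}
    for tail, head in arcs:
        successors.setdefault(tail, []).append(head)
        successors.setdefault(head, [])

    @lru_cache(maxsize=None)
    def longest_from(v):
        return max((1 + longest_from(w) for w in successors[v]), default=0)

    return max((longest_from(v) for v in successors), default=0)

# Assumed covering arcs for the partial order of Fig. 1(B).
hasse_b = [("a", "b"), ("e", "b"), ("b", "c"), ("c", "d")]
print(height(hasse_b))

# A 5-vertex complete order, with all of its arcs.
complete_c = list(combinations(["e", "a", "b", "c", "d"], 2))
print(height(complete_c))
```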
In a previous publication [13] we proposed to use the 
number of arc-disjoint paths as a metric of path 
diversity between a source and a destination. A group of 
arc-disjoint paths is a set of paths connecting two vertices 
through intermediate vertices, in which no arc is traversed 
by more than one path. Notice that 
arc-disjoint paths can visit the same vertex more than once, 
as long as different arcs are used to reach and leave each 
vertex. A group of arc-disjoint paths is a resilient strategy 
to send information from a source to a destination because 
paths do not share arcs or communication links. The 
number of arc-disjoint paths in a complete order is given by 
the following theorem: 

Theorem 2. If $C$ is a complete order with $n \geq 2$ vertices, 
then there are $n-1$ arc-disjoint paths from the transmitter 
to the receiver. 

Proof. This is proven by induction: 

The smallest possible 2-vertex complete order has only 
1 path: the arc $v_1 v_2$. Now assume that a complete order 
with $n$ vertices has $n-1$ arc-disjoint paths between the 
transmitter $v_1$ and the receiver $v_n$. Then, the transitive 
property implies that the complete order with $n+1$ vertices 
also has $n-1$ arc-disjoint paths between the transmitter 
$v_1$ and its receiver $v_{n+1}$. In addition to these, the arc 
$v_1 v_{n+1}$ provides an additional arc-disjoint path, because 
of the completeness property. This results in a total of $n$ 
arc-disjoint paths from $v_1$ to $v_{n+1}$. 

Notice that the number of arc-disjoint paths and the 
height of a complete order $H(C, \prec)$ are the same. Theorem 
2 also demonstrates that the smallest complete order that 
offers any path diversity has 3 vertices, and thus, just 2 
arc-disjoint paths. Still, it is necessary to determine how many 
arcs the $n-1$ arc-disjoint paths from the transmitter 
to the receiver will use ($u$) in a complete order, and how 
many arcs will remain unused ($r$) by these paths: 

Theorem 3. If $C$ is a complete order with $n \geq 2$ vertices, 
then all of the $n-1$ arc-disjoint paths from the transmitter 
to the receiver use exactly $u = 2n - 3$ arcs. 

Proof. This is also proven by induction: 

The smallest possible 2-vertex complete order uses only 
1 arc for its only path: the arc $v_1 v_2$. Now assume that 
a complete order with $n$ vertices uses $2n - 3$ arcs on its 
$n-1$ arc-disjoint paths between the transmitter $v_1$ and 
the receiver $v_n$. Most of these arcs follow a path from the 
transmitter to an intermediate node and then to the 
receiver, and there is just 1 direct path from the transmitter 
to the receiver. Then, because of its transitivity, the 
complete order with $n + 1$ vertices must also use $2n - 3$ arcs 
on its arc-disjoint paths between the transmitter $v_1$ and 
the receiver $v_{n+1}$ plus the arc $v_n v_{n+1}$ and, because of its 
completeness, the arc $v_1 v_{n+1}$. This gives a total of: 

$$u = 2n - 3 + 1 + 1 = 2(n + 1) - 3.$$ 
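
The counts in Theorems 2 and 3 can be checked numerically with a small Python sketch. It assumes one natural choice of arc-disjoint paths (the direct arc plus one two-hop path through each intermediate vertex); the function names are ours.

```python
def arc_disjoint_paths(vertices):
    """Direct arc plus one two-hop path through each intermediate vertex."""
    transmitter, receiver = vertices[0], vertices[-1]
    paths = [[transmitter, receiver]]
    for middle in vertices[1:-1]:
        paths.append([transmitter, middle, receiver])
    return paths

def arcs_used(paths):
    """List every arc traversed by the given paths (with repetitions)."""
    return [(p[k], p[k + 1]) for p in paths for k in range(len(p) - 1)]

for n in range(2, 9):
    paths = arc_disjoint_paths(list(range(1, n + 1)))
    used = arcs_used(paths)
    assert len(used) == len(set(used))   # no arc is shared by two paths
    assert len(paths) == n - 1           # Theorem 2
    assert len(used) == 2 * n - 3        # Theorem 3
    print(n, len(paths), len(used))
```

Every arc used here is of the form $v_1 v_k$ or $v_k v_n$, both of which exist in a complete order because all pairs of vertices are joined by an arc.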



Corollary 4. If $C$ is a complete order with $n \geq 2$ vertices, 
there are exactly $r = (n-2)(n-3)/2$ arcs that are not 
used in any of the $n-1$ arc-disjoint paths from the transmitter 
to the receiver. 

Proof. This result is easily obtained by subtracting the 
number of arcs used by the $n-1$ arc-disjoint paths 
(Theorem 3) from the complete order's total number of arcs: 
$n(n-1)/2 - (2n-3) = (n^2 - 5n + 6)/2 = (n-2)(n-3)/2$. 

Notice that r increases quadratically with respect to the 
number of vertices in the complete order (n) , while u only 
does so linearly. By closer inspection it is possible to see 
that when $n = 3$, $u = 3$ and $r = 0$, but because $r$ 
increases quadratically, when $n = 8$, $u = 13$ and $r = 15$. 
This means that when n > 7, r becomes larger than u. 
This may translate into ASs and their routers spending 
more resources to store and manage the r stand-by arcs, 
instead of the u arcs which form the primary arc-disjoint 
paths. Therefore, in practice, complete orders that grow 
beyond a maximum size of 7 vertices could be too costly 
to be considered a sensible solution for a communication 
network's routing needs. 
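
The crossover between $u$ and $r$ can be tabulated directly from Theorem 3 and Corollary 4. The Python fragment below is only a numerical check of those formulas; the function names are ours.

```python
def used_arcs(n):
    """u = 2n - 3: arcs on the n-1 arc-disjoint paths (Theorem 3)."""
    return 2 * n - 3

def unused_arcs(n):
    """r = (n-2)(n-3)/2: arcs left as stand-by (Corollary 4)."""
    return (n - 2) * (n - 3) // 2

def total_arcs(n):
    """n(n-1)/2: all arcs of a complete order with n vertices."""
    return n * (n - 1) // 2

for n in range(2, 11):
    u, r = used_arcs(n), unused_arcs(n)
    assert u + r == total_arcs(n)
    print(n, u, r, "r > u" if r > u else "")
```

For $n = 7$ the values are $u = 11$ and $r = 10$, so 7 is the largest size for which the stand-by arcs do not outnumber the primary ones.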

Another important characteristic of complete orders is 
that they offer flexibility and predictability when chang- 
ing their height (i.e., number of vertices). The following 
theorem demonstrates how easy it is to reduce the height 
of a complete order: 

Theorem 5 (from [17]). If $C$ is a complete order with 
at least 3 vertices, and if $v$ is any vertex of $C$, then $C - v$ 
is also a complete order. 

The previous theorem demonstrates that, when a chain 
of $n$ vertices needs to eliminate a vertex, it is possible 
to end with a chain of $n - 1$ vertices. Conversely, the 
next theorem and corollary analyze how many arcs will be 
needed to increase the height of a complete order: 

Theorem 6. If $C$ is a complete order with $n \geq 2$ vertices, 
then the augmented complete order $C + 1$ which has exactly 
1 more vertex than $C$ requires $n$ new arcs. 

Proof. $C$ has $\frac{n(n-1)}{2}$ arcs. So $C + 1$ has $\frac{(n+1)n}{2}$ arcs. 
Hence the difference in the number of arcs between these 
complete orders is: 

$$\frac{n^2 + n}{2} - \frac{n^2 - n}{2} = n.$$ 


Corollary 7. Increasing the number of vertices and the 
height of a complete order by 1 requires at least 2 new 
arcs. 

Proof. Since the smallest possible complete order has 2 
vertices, then 2 new arcs are needed to augment to the 
complete order of 3 vertices. 

Theorem 6 and Corollary 7 demonstrate why a single 
arc is not enough to increase the height of a chain. 
4. The Chain Routing framework 

We propose Chain Routing as a routing framework that 
employs complete orders (or chains) as the basis for de- 
termining a set of valid routes to a destination. Such a 
framework has two main advantages: 



1. It uses the maximum number of acyclic directed paths 
   between two nodes (vertices) as its natural unit of 
   path diversity. 

2. It requires a simple data structure to store several 
   paths (Section 4.1). 



The main objective of Chain Routing is to define a chain 
between the source s and a destination d which includes 
as many intermediate vertices as possible, provided the 
chain's maximum allowed size of 7 vertices is maintained. 
Vertices included in this set, other than s and d, are called 
the intermediate nodes from s to d. Because of the com- 
pleteness and maximality properties of complete orders, a 
chain can be used to represent a self-contained strategy to 
reach d, even when some of the intermediate nodes or links 
fail. This means that after a chain has been defined, all 
the ASs in the chain will transmit information following 
the same paths described by the topology of the complete 
order. 

Since Chain Routing can be thought of either as a 
replacement or as an enhancement to BGP, it can operate as a 
decentralized routing algorithm. This means that each AS 
must learn the network's topology through the announce- 
ments received from its neighbors, and also that each AS 
will need to define its own set of chains to different des- 
tinations. However, contrary to BGP, more coordination 
between ASs is needed to define the chains and to set the 
order of its intermediate nodes. The following example 
shows how some of the properties of chains could apply to 
destination digraphs: 

Example 1. Assume Fig. 1(C) represents a destination 
digraph for AS d, $D_{dst}(d)$. There are many routes e may 
choose to send packets to d, but the following 4 arc-disjoint 
paths provide the most resilient strategy: 

1. e → d 

2. e → a → d 

3. e → b → d 

4. e → c → d 

There are also 4 paths that are not arc-disjoint which e 
could use to reach d: 

1. e → a → b → d 

2. e → a → c → d 

3. e → b → c → d 

4. e → a → b → c → d 

When e follows the proposed strategy and uses 
arc-disjoint paths, it could balance the traffic load between each 
of these paths, or it could prefer to use the direct path 
(e → d) and leave the other arc-disjoint paths as backup. 
If e picks the latter option, and later link ed fails, e 
still has 3 safe alternative paths to route to d. 
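
The path counts in Example 1 can be reproduced by enumeration. The sketch below assumes the chain order e, a, b, c, d implied by the paths listed in the example; in a complete order every in-order selection of intermediate vertices yields a valid path, because all the required arcs exist. Function and variable names are ours.

```python
from itertools import combinations

order = ["e", "a", "b", "c", "d"]   # chain order assumed from Example 1

def all_paths(order):
    """Every directed path from the first to the last vertex of a chain."""
    source, dest = order[0], order[-1]
    middle = order[1:-1]
    paths = []
    for k in range(len(middle) + 1):
        for chosen in combinations(middle, k):
            paths.append([source, *chosen, dest])
    return paths

paths = all_paths(order)
primary = [p for p in paths if len(p) <= 3]   # direct arc and two-hop paths
others = [p for p in paths if len(p) > 3]     # share arcs with the primary set

# If link ed fails, the primary paths that do not use it remain usable.
surviving = [p for p in primary if p != ["e", "d"]]

print(len(paths), len(primary), len(others), len(surviving))
```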

Regardless of the individual computations that each AS 
will need to perform in order to determine its preferred chain 
to a destination, each chain will still need coordination 
between the source and the intermediate nodes in order to 
avoid instabilities in this system. For example, if e 
distributes its traffic between the 4 arc-disjoint paths, each of 
the intermediate nodes will need to know the strategy 
followed by e, otherwise a could use a non arc-disjoint path, 
like e → a → b → d, which may cause congestion with path 
e → b → d. 

Notice that none of the paths in the previous example 
can form a cycle, even if two or more paths are combined. 
This is because the original destination digraph is acyclic. 
Therefore it is possible to guarantee that, regardless of 
which vertex or arc fails, cyclic paths will not develop in 
this network. In other words, when an arc or a vertex (with 
its adjacent arcs) is removed from an acyclic digraph, the 
result is still an acyclic digraph. 

It is our intention that the order of the ASs in a chain 
determines how information packets flow in the network, 
but not how control messages are exchanged between ASs. 
In other words, ASs could be allowed to break the chain 
order to quickly communicate a change in the network's 
topology or the occurrence of a failure which causes loss 
of connectivity. 

4.1. Chain Routing data structure 

The main objective of this structure is to keep a corre- 
spondence between the network's topology and the chains 
used to route information through it. In order to achieve 
this, the Chain Routing data structure will store three 
different basic structures, and two of these may also 
contain (or point to) other basic structures. We will also use 
the concept of levels of abstraction to represent that 
a structure may recursively contain other structures. Ini- 
tially structures will be stored at level 0, and these may 
contain other structures at level 1, which may also contain 
other structures at level 2, and this continues until all the 
structures of the topology have been recorded. The three 
basic structures we propose for Chain Routing are: 

arc The most basic structure only records a link between 
two ASs and it cannot contain other structures. 

Varc It describes a path which, in terms of the data struc- 
ture, is just a sequence of arcs. A Varc usually con- 
tains arcs at the next level of abstraction. 

chain A chain or complete order may contain any of the 
three basic structures at the next level of abstraction; 
henceforth the arcs of a chain will be called segments 
to denote that they may be of different type to the 
basic arc structure. 
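
The three basic structures could be sketched as follows. This is a hypothetical Python rendering, not the data structure of our implementation: the class names, field names and the small 3-vertex chain are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Arc:
    """A link between two ASs; it cannot contain other structures."""
    tail: str
    head: str

@dataclass
class Varc:
    """A path from tail to head, stored as a sequence of contained parts."""
    tail: str
    head: str
    parts: List["Segment"] = field(default_factory=list)

@dataclass
class Chain:
    """A complete order; each segment may be an Arc, a Varc or a Chain."""
    vertices: List[str]
    segments: List["Segment"] = field(default_factory=list)

Segment = Union[Arc, Varc, Chain]

def deepest_level(structure, level=0):
    """Deepest level of abstraction reached below a structure at `level`."""
    if isinstance(structure, Arc):
        return level
    children = structure.parts if isinstance(structure, Varc) else structure.segments
    return max((deepest_level(c, level + 1) for c in children), default=level)

# Hypothetical chain C(x, y, z): segment xz is a Varc through vertex w.
chain = Chain(
    vertices=["x", "y", "z"],
    segments=[
        Arc("x", "y"),
        Arc("y", "z"),
        Varc("x", "z", parts=[Arc("x", "w"), Arc("w", "z")]),
    ],
)
print(deepest_level(chain))
```

Here the chain sits at level 0, its three segments at level 1, and the two arcs of the Varc at level 2.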
The Varc structure mentioned before records ASs which 
do not possess enough connectivity to form a chain, but 
that still allow transmitting packets through them and 
hence, could be considered as a sequence of arcs that follow 
a predefined path: 

Definition 1. A Varc or virtual arc is a structure that 
represents a set of vertices and their adjacent arcs that 
follow a directed path from an initial vertex x to a final 
vertex y, where the directed path needs to be abstracted in 
order to allow x and y to be the end vertices of a chain 
segment. 

A new link must be initially recorded in the data struc- 
ture as an arc at level 0, but it is necessary to consider 
that this new arc could also: 

1. allow defining a new or a longer chain. 

2. help to combine two chains, with common vertices, 
into a longer one. 

3. be included as part of a Varc. 

Of the three previous options, arcs that can be 
included as part of a Varc should have the least preference, 
because Varcs do not increase the path diversity of the 
network. The other two options should only be preferred 
depending on which one will form the chain with greater 
height. 

Now we provide an example of how the Chain Routing 
data structure could be employed to describe the topology 
of a network: 

Example 2. The network depicted in Fig. 2 shows a 
10-vertex destination digraph which has been represented or 
abstracted using the 4-vertex chain $C_1(s, e, b, d)$, which 
must be stored at level 0 in the Chain Routing data 
structure. Such a chain has 6 segments that need to be recorded 
at level 1: 

1. Segment se is an arc. 

2. Segment sb is a Varc formed of arcs sa and ab. 

3. Segment sd is a Varc formed of arcs sc and cd. 

4. Segment eb is a Varc formed of arcs ef and fb. 

5. Segment ed is a Varc formed of segment eg and arc 
   gd. 

6. Segment bd is an arc. 

The arcs and segments of each Varc are recorded at the 
next level of abstraction (2). Segment eg of Varc ed is 
abstracted as chain $C_2(e, h, g)$, and its segments are stored 
at level 3: 

1. Segment eh is an arc. 

2. Segment eg is a Varc formed of arcs ef and fg. 

3. Segment hg is a chain $C_3(h, i, g)$. 

Finally, each of the arcs of Varc eg and the segments of 
$C_3$ are stored at level 4 of the data structure. 
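
The nesting described in Example 2 can be written down as a tree and walked to recover the level of each structure. The following Python sketch uses plain tuples of the form (kind, label, children); this encoding and the helper names are assumptions made for illustration. The segments of the third chain are not itemised in the example, so they are left empty rather than guessed.

```python
def arc(label):
    return ("arc", label, [])

def varc(label, *parts):
    return ("Varc", label, list(parts))

def chain(label, *segments):
    return ("chain", label, list(segments))

# Segment hg: chain C3(h, i, g); its own segments are not listed in the text.
c3 = chain("C3(h,i,g)")

# Segment eg of Varc ed: chain C2(e, h, g).
c2 = chain("C2(e,h,g)",
           arc("eh"),
           varc("eg", arc("ef"), arc("fg")),
           c3)

# Main chain C1(s, e, b, d) with its 6 segments.
c1 = chain("C1(s,e,b,d)",
           arc("se"),
           varc("sb", arc("sa"), arc("ab")),
           varc("sd", arc("sc"), arc("cd")),
           varc("eb", arc("ef"), arc("fb")),
           varc("ed", c2, arc("gd")),
           arc("bd"))

def walk(node, level=0):
    """Yield (level, kind, label) for a structure and everything it contains."""
    kind, label, children = node
    yield level, kind, label
    for child in children:
        yield from walk(child, level + 1)

for level, kind, label in walk(c1):
    print("  " * level + f"level {level}: {kind} {label}")
```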






Figure 2: Chain Routing example 



The previous example demonstrates how, by nesting 
chains and Varcs, it is possible to abstract many vertices 
and arcs in a network. The main chain in this example, 
$C_1$, uses e and b as its intermediate nodes and it only has 
a height of 3, which shows that the number of vertices 
in a chain may be significantly less than the number of 
nodes in its destination digraph. Nesting structures and 
virtual arcs make it possible to condense the network's topology and, 
more importantly, to focus attention on the ASs which are 
central to the path diversity and resilience of the network. 

In order to define the structures shown in Fig. 2, a great 
degree of coordination between the ASs in this network is 
needed. For example, AS e has four valid options for an 
intermediate node to d: b, f, g and h; but only b allows 
the definition of $C_1$ as depicted. Therefore, s would need 
to coordinate with e in order to obtain the solution shown 
in this figure. 

Notice that the previous example is not the only 
solution which could be used to represent the network in Fig. 2 
using chains, but this paper does not try to provide the 
final and full implementation of the Chain Routing 
framework, just to prove that complete orders could be used as 
a safe method to perform routing. 

4.2. A naive implementation of Chain Routing 

To demonstrate the applicability of our idea to a 
directed graph and that it is feasible to use complete orders 
to represent the path diversity of the Internet, we 
implemented a computational program capable of finding as 
many chains as possible in the 45 announcement digraphs 
of European ASs that we obtained in a previous analysis 
of the Internet [13] [20]. 
Algorithm 1: Modified BFS algorithm 
Input: The adjacency matrix of $D_{anc}(i) = (V, A)$ 

foreach v ∈ V do 
    predecessor(v) ← nil; 
    distance(v) ← nil; 
end 
distance(i) ← 0; 
QUEUE ← i; 
while QUEUE ≠ ∅ do 
    x ← head of QUEUE (also delete x from QUEUE); 
    foreach out-neighbour (y) of x do 
        if distance(y) = nil then 
            distance(y) ← distance(x) + 1; 
            predecessor(y) ← x; 
            tail of QUEUE ← y; 
            if (y is the only out-neighbour of x) AND (x ≠ i) then 
                Add a Varc to the data structure using arcs xy 
                and the one formed by predecessor(x) and x; 
            else 
                Add arc xy to the data structure; 
            end 
        else 
            if predecessor(y) = nil then 
                Arc xy is just a cycle back to the origin, ignore it; 
            else 
                Add a chain to the data structure using 
                (transitive) arc xy; 
            end 
        end 
    end 
end 



We presume that the simplest way to identify chains 
in a digraph is by using the transitive relationship that 
exists between its vertices. Therefore, a vertex that can 
be reached through more than one path must possess a 
transitive relationship with at least one other vertex in 
this graph. The program we developed is based on the 
Breadth-First Search (BFS) algorithm [18]. We decided 
to use BFS because it first discovers the direct paths from 
a source vertex, and later the paths that use intermediate 
nodes. Our modified BFS algorithm is given as 
Algorithm 1. The input to this program is the adjacency 
matrix of an announcement digraph. 

As illustrated in Algorithm 1, when a vertex is visited for
the first time (distance(y) = nil) it is added to the data
structure either as an arc or as part of a Varc. Conversely,
vertices that have been visited before may be included in a
chain and need to be further processed to determine the
length of the chain and the vertices that could form it. The
final objective of our program is to create and store the
topological information of the announcement digraph in a
Chain Routing data structure. Algorithm 1 was implemented
in a C++ program, which builds a Chain Routing data
structure by finding most of the available chains and paths
from vertex i to the other 44 destinations (vertices). The
specific functions that process the chains and implement the
database are too large to be included in this paper.
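The control flow of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the C++ program described above: the input is an adjacency dictionary rather than an adjacency matrix, the function name `modified_bfs` and the record format are hypothetical, and the further processing of chains (which the paper does not list) is reduced to recording the transitive arc that revealed them.

```python
from collections import deque


def modified_bfs(adj, source):
    """Classify arcs while running BFS from `source`, following Algorithm 1.

    adj: dict mapping a vertex to a list of its out-neighbours (assumed format).
    Returns a list of (kind, vertices) records with kind in
    {"arc", "varc", "chain"}.
    """
    vertices = set(adj) | {y for ys in adj.values() for y in ys}
    distance = {v: None for v in vertices}
    predecessor = {v: None for v in vertices}
    distance[source] = 0
    queue = deque([source])
    found = []
    while queue:
        x = queue.popleft()
        out = adj.get(x, [])
        for y in out:
            if distance[y] is None:
                # First visit: y becomes part of an arc or a Varc.
                distance[y] = distance[x] + 1
                predecessor[y] = x
                queue.append(y)
                if len(out) == 1 and x != source:
                    found.append(("varc", (predecessor[x], x, y)))
                else:
                    found.append(("arc", (x, y)))
            elif predecessor[y] is None:
                # Only the source has a distance but no predecessor:
                # arc xy is a cycle back to the origin and is ignored.
                continue
            else:
                # y was reached before: xy is a transitive arc of a chain.
                found.append(("chain", (x, y)))
    return found
```

For a small digraph in which i reaches b both directly and through a, the transitive arc ab is reported as part of a chain, while the arc from b back to i is ignored.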

The 45 announcement digraphs of the European ASs included
in our previous study [20] were processed and analyzed
using the modified BFS algorithm. Then the longest possible
chain or structure to each of the other 44 announced ASs
was recorded in a large table (45 × 45), which is available
in [21]. Table 1 shows the results for only the first 10 ASs
in the original list of ASs. A description of what the
entries in this table mean follows:

number This is the height of the longest chain between 
the source AS and the announced AS. When this num- 
ber is 1, it indicates that there is only one arc between 
the ASs, but the announced AS is part of a longer 
chain. This means that, although there is no path di- 
versity to this AS, it is still crucial to the connectivity 
of other ASs. 

A This entry means that there is just an arc between the 
source AS and the announced AS. There is no further 
path diversity available. 

B This means that there is a bridge between the source AS
and the announced AS: although a chain is present in
the path, at some point only an arc separates the two
ASs. Therefore, a bridge indicates that the connectivity
between the ASs is limited.

Table 2 shows the number of arc-disjoint paths between the
same set of ASs, but using the results obtained in [20]. By
comparing Tables 1 and 2, it is possible to see that there is
a strong relation between the number of arc-disjoint paths
found and the height of the chains obtained by our program.
This indicates that our basic implementation of the Chain
Routing framework is efficient at exploiting the path
diversity of a digraph.

Since the height of a chain and the number of arc-disjoint
paths are equal, the results in Table 1 show that in most
cases there are 2 arc-disjoint paths to reach a destination,
and that sometimes there is enough connectivity to build 3,
or even 4, arc-disjoint paths to a destination, such as
between AS3303 and AS8220. This demonstrates that Chain
Routing can be employed to find and use alternative paths
which may increase the resilience between a source and a
destination.

Table 1: Longest possible chain or structure from an originating AS to other ASs

On the other hand, we also noticed that some chains
conflicted with each other. An example of such a conflict
would be C_1(a, b, c) and C_2(a, c, b). This situation
implies that, if Chain Routing is going to exploit the
connectivity between a, b, and c, it will need to select and
use only one of C_1 or C_2; otherwise, cyclic behavior could
arise between these two structures. It also means that we
may need to use a chain that is convenient as a general
routing strategy, even if this is not the best solution for a
particular destination.
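The conflict between C_1(a, b, c) and C_2(a, c, b) can be detected mechanically. The sketch below is an illustration only: it assumes a chain is represented as a tuple of vertices in chain order, and the helper name `chains_conflict` is hypothetical. Two chains are flagged when they place some pair of shared vertices in opposite order.

```python
from itertools import combinations


def chains_conflict(chain_a, chain_b):
    """Return True if the chains order a shared pair of vertices oppositely."""
    pos_b = {v: k for k, v in enumerate(chain_b)}
    # Shared vertices, listed in the order in which chain_a visits them.
    shared = [v for v in chain_a if v in pos_b]
    # Every pair (u, v) has u before v in chain_a; a conflict exists
    # when chain_b puts v before u.
    return any(pos_b[u] > pos_b[v] for u, v in combinations(shared, 2))
```

With this representation, `chains_conflict(("a", "b", "c"), ("a", "c", "b"))` reports a conflict because b and c appear in opposite orders.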

The complete table, available in [21], shows that some ASs
can only use bridges (B) and arcs (A) to reach the other 44
destinations. These ASs mostly rely on a better-connected
AS to route to the rest of the network. This is probably
what ASs without many communication links experience in
the Internet. It also implies that ASs with limited
connectivity will enjoy fewer benefits from the Chain
Routing framework.

Fig. 3 shows the frequency of each chain height recorded in
the complete table [21]. Since chains of height 1 and arcs
(A) are similar, they are both counted under the same
column (height = 1), where arcs appear in a darker color.
There were only 2 occurrences of chains of height 4, and the
most frequent chain, with a height of 2, had a count of 670.
This figure omits a number of entries that produced invalid
results because our program failed to process some
announcement digraphs. These failures were caused by
functionality that was still under development.

The results obtained from this analysis, and the fact that
the Chain Routing implementation used here is neither
optimized nor complete, call for further development of
software that can automatically search for and define
chains in the Internet. Such an enhanced algorithm may
implement either post-processes to combine chains and
nested structures that were not initially discovered by our
BFS algorithm, or functions specialized in discovering
chains instead of nodes.

Figure 3: Frequency of chain height found



5. Applying Chain Routing 

This section has two objectives. First, it analyzes how
Chain Routing would increase network stability by
demonstrating how this framework could help solve or
ameliorate the effects of three of the most documented
Internet pathologies (5.1). Then, it considers the cost of
fully implementing and employing Chain Routing in a
network (5.2).



5.1. Chain Routing and Internet stability 
5.1.1. Persistent Route Oscillations 

Although PRO [5] develop in networks that have
announcement digraphs with cycles, this pathology also has
a temporal aspect caused by the path selection process,
which is continuously executing in BGP. This means that in
order to avoid cyclic behaviors within a chain, it is
necessary to consider the dynamics of this system.
Therefore, we propose a pair of mechanisms which avoid the
development of PRO in a chain. These have been converted
into rules which can be implemented in a Chain Routing
system:

Table 3: Path preference for the canonical PRO network

AS    Paths to d     Preference
c     c → a → d
      c → d

Rule 1. Before agreeing to become part of a chain, every AS
needs to verify that the proposed chain does not create a
cycle with any other basic structure already defined in its
own data structure.

This is similar to BGP's current functionality, where ASs
constantly monitor that cycles do not develop in their
paths. To demonstrate how Rule 1 stops PROs from
developing, we will use the canonical PRO example first
introduced by Varadhan et al. [5] and depicted in Fig. 4, in
which ASs a, b and c have different options to reach AS d,
but all of them prefer to use their longer path through the
next neighbor over their shorter direct path. This
preference is described in Table 3, and it causes an endless
sequence in which every AS prefers the path through its
counterclockwise neighbor over its own shortest path. To
demonstrate that this cyclic behavior will not develop if
Rule 1 is applied, we provide the following example:

Example 3. Suppose that C_a(a, b, d) is the first chain
defined in the network depicted in Fig. 4. This is a
perfectly valid chain irrespective of which route AS a uses
to reach d (either the direct ad route or the one through
b). Now suppose that c tries to use a to reach d, but
because there is only one path from c to b (ca + ab), it will
define chain C_c(c, a, d) where the segment ad is actually
C_a. Finally, AS b tries to use c to reach d, but when b
requests to create chain C_b(b, c, d), c will apply Rule 1,
realize that b is already part of C_a (and C_c), and reject
creating C_b. Thus the PRO has been avoided.
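The check performed in Example 3 can be sketched as a cycle test over the order relations of the chains accepted so far. This is an illustration under assumptions, not the paper's data structure: chains are tuples in chain order, every vertex of a complete order is taken to precede all later vertices, the check is run over one shared list of chains rather than per AS, and the names `chain_arcs` and `creates_cycle` are hypothetical. Kahn's topological sort is used as the cycle test.

```python
def chain_arcs(chain):
    """Arcs of a complete order: each vertex points to every later vertex."""
    n = len(chain)
    return {(chain[i], chain[j]) for i in range(n) for j in range(i + 1, n)}


def creates_cycle(accepted_chains, proposed):
    """Return True if adding `proposed` to the accepted chains forms a cycle."""
    arcs = chain_arcs(proposed)
    for chain in accepted_chains:
        arcs |= chain_arcs(chain)
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())
    indegree = {v: 0 for v in successors}
    for u in successors:
        for v in successors[u]:
            indegree[v] += 1
    # Kahn's algorithm: repeatedly remove vertices with no incoming arcs.
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # Vertices left over can only be explained by a cycle.
    return removed != len(successors)
```

With C_a(a, b, d) and C_c(c, a, d) accepted, the proposal C_b(b, c, d) closes the cycle a, b, c, a and is rejected, which mirrors the outcome of Example 3.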

The second rule to avoid PRO in a chain is: 

Rule 2. When a segment in a chain becomes unavailable and
an alternative path needs to be used, it is safer to select
paths that, because of the chain's topology, cannot route
information through the unavailable segment. Once it has
been confirmed that the paths adjacent to the failed
segment can still reach the destination, they may be
reinstated as safe paths.

Figure 4: The canonical PRO network by Varadhan et al.

Table 4: Path preference for the PRO network by Griffin et al.

Figure 5: Another PRO case by Griffin et al.

This type of behavior helps the network to quickly reach a
stable state because alternative paths that have a
probability of failure are not used. An example of how to
apply this rule is illustrated in Fig. 5, which was
originally introduced by Griffin et al. [22]. The route
preference for this network is described in Table 4, where
X represents the fact that the originating AS's policy only
requires that this alternative path sends information
through its counterclockwise neighbor and finishes in d.
Initially, ASs c, e and f will all send information to d
through AS b, but when link bd fails, c, e and f will prefer
to use their counterclockwise neighbor instead of the most
direct route through a. This produces the same cyclic
system and behavior described in Fig. 4. The following
example analyzes what happens when Chain Routing and
Rule 2 are applied:
Example 4. In the network depicted in Fig. 5, a Chain
Routing system could define the following three chains:

1. C_1(c, f, b, d) with Varcs ca + ad and fa + ad.

2. C_2(e, c, b, d) with Varcs ea + ad and ca + ad.

3. C_3(f, e, b, d) with Varcs fa + ad and ea + ad.

So when link bd fails, ASs c, e and f will comply with
Rule 2 and select paths that, because of the chain's
topology, cannot route information through the faulty bd
segment. In this case, the direct path of each chain would
be the only route that agrees with Rule 2. For example, AS
c would select the segment cd (Varc ca + ad), which cannot
route information through segment bd.
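One possible reading of Rule 2 can be written down as a filter on next hops. The sketch below is an interpretation, not a definition taken from the paper: it assumes that a chain is a tuple in chain order, that traffic only moves toward later vertices, and that a next hop is safe when it lies strictly after the tail of the failed segment, since from there the failed segment can no longer be reached. The name `safe_next_hops` is hypothetical.

```python
def safe_next_hops(chain, source, failed):
    """Next hops of `source` that cannot forward traffic over `failed`.

    chain:  tuple of vertices in chain order, ending at the destination.
    failed: (tail, head) pair naming the unavailable segment.
    """
    pos = {v: k for k, v in enumerate(chain)}
    tail, head = failed
    return [
        w
        for w in chain[pos[source] + 1:]
        # w must lie after the tail of the failed segment, and the
        # failed segment itself is never offered as a next hop.
        if pos[w] > pos[tail] and not (source == tail and w == head)
    ]
```

For C_1(c, f, b, d) with bd unavailable, the only safe next hop for c is d, that is, the direct segment cd, which agrees with Example 4.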

By assuring that no cycles develop when the topology 
of a network changes, these rules provide Chain Routing 
with enough robustness to avoid PROs in a network. 

5.1.2. Delayed network convergence 



We explained in Section 2.2 that path exploration in a BGP
system may cause transient loops and dropped packets.
Chain Routing uses complete orders as its topological unit,
which are acyclic digraphs. Therefore, it is possible to
guarantee that transient loops will not develop.
Conversely, because Chain Routing does not control the
dynamics of the system, it cannot assure that the network
will reach faster convergence times, nor that information
packets will not get lost while the network is in its
transient state. However, in Section 6 we will illustrate
that by combining complete orders in time and topology it
may be possible to shorten the duration of transient
instabilities in the Internet.

5.1.3. Network congestion 

Chain Routing provides a framework which can be employed
to perform traffic engineering between all its arc-disjoint
paths. In general, by applying Theorem 2, a chain of n
vertices could use n − 1 arc-disjoint paths to distribute
traffic between the source and destination. Therefore,
although Chain Routing cannot directly eliminate
congestion in a network, it allows better traffic
administration mechanisms to be implemented to avoid this
problem.
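The count of n − 1 arc-disjoint paths can be illustrated with one simple construction: the direct arc from the source to the destination, plus one two-hop path through each of the n − 2 intermediate vertices. This is only a sketch of one way to obtain such paths in a complete order, not a restatement of the proof of Theorem 2, and the function names are hypothetical.

```python
def arc_disjoint_paths(chain):
    """Build n - 1 source-to-destination paths in a chain of n vertices."""
    source, dest = chain[0], chain[-1]
    paths = [[source, dest]]  # the direct arc
    paths += [[source, mid, dest] for mid in chain[1:-1]]  # one per intermediate
    return paths


def are_arc_disjoint(paths):
    """Check that no arc is used by more than one path."""
    seen = set()
    for path in paths:
        for arc in zip(path, path[1:]):
            if arc in seen:
                return False
            seen.add(arc)
    return True
```

For a chain of four vertices this yields three paths, and `are_arc_disjoint` confirms that they share no arc, so traffic could be split across them.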

In conclusion, the Chain Routing framework may help to
increase network resilience while exploiting the Internet's
path diversity, but it cannot solve all the instabilities
observed in the Internet.

5.2. The cost of implementing Chain Routing 

Chain Routing propagates its routing information via
announcement digraphs, which possess more arcs and better
connectivity than the BGP digraph [13]. Unfortunately, the
increased connectivity of the announcement digraph also
requires more control messages to define chains. The
number of extra messages depends on the policies that each
AS applies to its neighbors; therefore, it is not possible
to accurately predict how many are needed to define the
chains in a network. Besides the additional messages
produced by using the announcement digraph, it is
necessary to consider that coordination messages will be
needed between the ASs that form chains in order to
maintain these structures. Although the details of the
messages needed to establish such chains are outside the
scope of this article, a tentative solution would need:

1. A message from the source to every intermediate node
(n − 2 of them) requesting to establish a chain.

2. A reply message from every intermediate node accepting
or rejecting to be part of the chain.

3. Another message from the source to the intermediate
nodes that have accepted to be part of the chain,
confirming that the chain has been implemented.

This means that a chain of length n may need up to
3(n − 2) extra messages to be established. In practice,
there may be conflicts when deciding who would be the
source and the order of the intermediate vertices, and this
may cause the number of messages to increase while a
suitable chain is defined.
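The bound of 3(n − 2) messages follows directly from the three rounds listed above. The short sketch below counts them; it is an illustration only, the function name `chain_setup_messages` and the `accepts` mapping are hypothetical, and it assumes that the confirmation round is sent only to the intermediate nodes that accepted.

```python
def chain_setup_messages(chain, accepts=None):
    """Count the messages of the tentative three-round chain setup.

    chain:   tuple of vertices (source, intermediates..., destination).
    accepts: optional dict mapping an intermediate node to True/False;
             nodes that are not listed are assumed to accept.
    """
    accepts = accepts or {}
    intermediates = chain[1:-1]
    requests = len(intermediates)       # round 1: source -> intermediates
    replies = len(intermediates)        # round 2: intermediates -> source
    confirmations = sum(                # round 3: source -> accepting nodes
        1 for v in intermediates if accepts.get(v, True)
    )
    return requests + replies + confirmations
```

For a chain of four vertices in which every intermediate node accepts, the count is 3(4 − 2) = 6 messages; a rejection lowers the count for that attempt, but a new attempt would add its own messages.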

It is also necessary to consider the amount of resources 
required to store and manage the data structure that will 
record chains. Although this data structure will contain a 
list of chains to different destinations, more than one des- 
tination could be recorded in each chain, therefore Chain 
Routing should employ a smaller number of chains than 
the number of destinations in the network. Since this pa- 
per only provides an outline of the Chain Routing frame- 
work, we are not calculating this cost here and will address 
it as part of future research. Nevertheless, it is important
to remember that the increased cost of managing and
storing the Chain Routing data structure is what enables
it to keep more topological information about the network.

6. Beyond Topology: Complete Orders in Time 

We recognize that any routing protocol not only needs to
find the best paths to reach a destination, but also to
adapt to sudden changes in topology. Therefore, it is
necessary to consider time as another important constraint
in the behavior of a routing protocol. Subsection 6.1
analyzes the interactions between events that follow a
temporal order or a topological order, and demonstrates
that when either of these orders fails to exist,
instabilities may arise in the network. Then, subsection
6.2 proposes to use different timescales to allow Chain
Routing to maintain the stability of the network.

6.1. Temporal order vs. topological order 

The following example demonstrates what could happen 
when there is no clear order in the time realm. 

Example 5. In the network depicted in Fig. 6, AS a suffers
a failure at t = 0, but recovers from it at t = 2. AS b has
a fast link to a and d, therefore b quickly reports a's
failure and recovery. On the other hand, c cannot pass
routing information to d as fast as b does. This causes a
delay in the messages which corrupts the order in which
they arrive at d. Therefore, d sees that a failed at t = 1,
recovered at t = 3, failed again at t = 4 and recovered at
t = 5. This lack of order in time may also cause other
problems in larger networks.

Figure 6: Network with no temporal order (a fails at t=0 and recovers at t=2; b reports the failure at t=1 and the recovery at t=3; c reports the failure at t=4 and the recovery at t=5)

Figure 7: Network with no topological order (e fails at t=0; f and g report the failure at t=1; h reports the failure at t=4)

In order to avoid the route flapping presented in the
previous example, it is necessary to implement an order in
time. A simple way to achieve this is by reporting not only
the occurrence of an event, but also the time when it
happened [6]. If the messages in Fig. 6 include the time at
which the failure (t = 0) and the recovery (t = 2)
happened, when AS d receives the delayed failure message
from c at t = 4, d would be able to determine that this is
an event that happened before the recovery message
announced by b at t = 3.
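The effect of carrying the event time in each message can be reproduced with the four messages of Example 5. The sketch below is an illustration under assumptions: each message is a hypothetical tuple (arrival time at d, time of the event, reported status), and d simply discards any message whose event time is not newer than the latest event it has already applied.

```python
def replay(messages, use_event_time):
    """Return the state changes that d applies, as (arrival, status) pairs.

    messages: iterable of (arrival_time, event_time, status) tuples.
    """
    state = None
    latest_event = None
    transitions = []
    for arrival, event, status in sorted(messages):
        if use_event_time:
            if latest_event is not None and event <= latest_event:
                continue  # stale report of an event that d already knows
            latest_event = event
        if status != state:
            state = status
            transitions.append((arrival, status))
    return transitions


# Messages of Example 5: b is fast, c is delayed.
EXAMPLE_5 = [
    (1, 0, "down"),  # b reports the failure of a
    (3, 2, "up"),    # b reports the recovery of a
    (4, 0, "down"),  # c reports the same failure, late
    (5, 2, "up"),    # c reports the same recovery, late
]
```

Without event times, d flaps four times (down, up, down, up); with them, the two delayed messages from c are recognised as stale and d changes state only twice.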

In contrast, the following example shows a network in 
which there is no clear order in topology. 

Example 6. In the network shown in Fig. 7, AS e fails at
t = 0, and ASs g and f acknowledge this failure at t = 1,
but h takes more time to detect that e has become
unavailable; this causes g and d to insist that they can
still route data through h. It is not until t = 4 that h
reports that it cannot reach e, and then g and d stop
trying to send data through h.

The instability shown in the previous example could have
been avoided if an order in topology had been established
before the failure occurred. For example, if chain
C(d, h, g, e) is defined, when e fails and g reports the
failure at t = 1, because h is lower than g in its Hasse
diagram, the vertex order will forbid g from using h to
reach e, thus the problem is avoided. On the other hand, if
the chain C(d, g, h, e) is defined, Rule 2 (see 5.1.1) will
force AS d to use its direct link Varc(df + fe) to reach e,
but because f also reported the failure at t = 1, d will be
aware that e has become unavailable.

In Fig. 7, if we route to e using C(d, h, g, e), h has two
paths to the destination (h → e and h → g → e), while g
has only one (g → e). Conversely, if we use C(d, g, h, e),
h has only one path to e and g has two. This means that
vertices that are lower in the Hasse diagram of a chain have
more paths and better connectivity to the destination than
those that are higher.
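The path counts quoted above can be checked with a short dynamic program. The sketch assumes, as in the examples of this section, that a chain is written from the source to the destination and that every vertex has an arc (or Varc) to each later vertex; the function name `paths_to_destination` is hypothetical.

```python
def paths_to_destination(chain):
    """Count, for each vertex, the distinct paths to the last vertex."""
    dest = chain[-1]
    counts = {dest: 1}
    # Work backwards: a vertex can continue through any later vertex.
    for k in range(len(chain) - 2, -1, -1):
        counts[chain[k]] = sum(counts[w] for w in chain[k + 1:])
    del counts[dest]
    return counts
```

For C(d, h, g, e) this gives two paths for h and one for g, and the roles are swapped for C(d, g, h, e), matching the counts given in the text.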

The previous two examples demonstrate that by combining
orders in time and topology it is possible to maintain
network stability, but when either of these two orders is
absent, instabilities will occur in the network due to
inaccurate information.



6.2. Chain Routing timescales 

As stated in Section 5.2, Chain Routing will need at least
three rounds of messages to establish a chain. However,
once a chain has been defined, every AS in it will be
connected to every other AS via a chain segment. Therefore,
we will define two timescales for the Chain Routing
framework: the long-term scale will be similar to the
period of time needed to define a chain, while the
immediate timescale will be comparable to the time needed
to transmit a message between two ASs in a chain. These
timescales would help to support stable and reliable
connectivity where only necessary changes are sudden.
Hence, at the immediate timescale, an AS should use a
selected chain to a destination for as long as there is a
viable path in that structure, and only if no path is
available will the source switch to a different chain.
Meanwhile, at the long-term scale, the source should
constantly monitor for a longer or more stable chain to
reach the same destination, and only when all the involved
ASs have reached an agreement will the source be free to
switch and use the new chain.
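The two timescales can be summarised as two separate decision rules. The sketch below is a loose illustration under assumptions: chains are tuples from source to destination, `is_viable` is a hypothetical callback that reports whether a chain still offers a path, `agreed` is a hypothetical mapping with the answer of each involved AS, and "longer" is used as the only criterion for a better chain.

```python
def immediate_choice(current, alternatives, is_viable):
    """Immediate timescale: keep the current chain while it has a viable path."""
    if is_viable(current):
        return current
    for chain in alternatives:
        if is_viable(chain):
            return chain
    return None  # no chain currently offers a path


def long_term_choice(current, candidate, agreed):
    """Long-term timescale: switch only to a longer chain that all ASs accept."""
    everyone_agrees = all(agreed.get(v, False) for v in candidate[1:])
    if len(candidate) > len(current) and everyone_agrees:
        return candidate
    return current
```

Keeping the rules apart reflects the idea in the text that sudden changes are limited to failures, while improvements wait for agreement among the ASs involved.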

It is also important to consider that, just like BGP be- 
comes unstable when it cannot find a suitable set of paths 
to reach a destination, it is also possible that Chain Rout- 
ing cannot find a suitable set of chains in a network. This 
may produce oscillations between competing alternative 
chains. Fortunately, this is a different type of problem in 
which it is possible to stop the oscillations without affect- 
ing the traffic because every chain is a robust solution. 
Still, it is necessary to develop mechanisms that eliminate 
the possibility of Chain Routing becoming unstable be- 
cause it cannot find a definitive chain to reach a destina- 
tion. 
7. Discussion

As a routing framework, Chain Routing offers many ad- 
vantages over BGP's current implementation. The main 
one is that failure of a link or an AS does not require the 
protocol to reconverge, because each AS knows beforehand 
alternative paths to reach a destination. This is the result 
of Chain Routing's topological unit, the complete order, 
and its two timescales, immediate and long-term, which 
allow it to define more than one arc-disjoint path to a des- 
tination. 

Other advantages of employing the Chain Routing 
framework are that: 

1. it allows easy implementation of traffic engineering. 
This is because, once a chain has been established, it 
is trivial to identify the other stable paths and to use 
them concurrently. 

2. failures will not cause transient data loops in the net- 
work. 

3. it is easier to identify and prevent oscillatory behav- 
iors (PRO). 

Conversely, Chain Routing will produce some challenges 
that need to be addressed before a final implementation is 
proposed. A problem that derives from the increased com- 
plexity of Chain Routing is that route aggregation may be 
difficult once chains to destinations have been established. 
This happens because a new set of IP addresses would re- 
quire further network coordination which may result in an 
intermediate AS rejecting the modifications and impacting 
a previously defined chain. 

Also, since vertices lower in the Hasse diagram of a chain
will enjoy more connectivity than those at the top, there
might be disputes on the order that ASs will follow when
defining a chain. This means that, in some cases, human
intervention and negotiation may be required to form
chains. It also means that the traditional
customer-provider model [10] might need to be reconsidered
and perhaps superseded by another economic model which
accommodates chains.

Another factor that has not been addressed by this
research is the potential interaction between interior
gateway protocols and Chain Routing. It has been previously
demonstrated that such interactions are sometimes
problematic for BGP [5]. Therefore, it makes sense to
develop a solution that allows safe and stable interactions
between different types of routing protocols.

8. Conclusion 

In this paper we have proposed the development of a
routing framework whose main topological unit is the
complete order: Chain Routing. Such a framework makes it
possible to exploit the Internet's unused path diversity
while maintaining the stability of the network. The main
advantages of using complete orders are that they allow
easy implementation of traffic engineering, enable the
nodes in the chain to quickly find alternative paths when a
failure occurs, and avoid the occurrence of transient
loops. Although Chain Routing is a more stable solution
than the current BGP implementation, it also requires more
coordination between ASs.

Still, this proposed framework is just a theoretical solu- 
tion that requires further empirical development and test- 
ing. This research has only laid down the foundations of 
a new routing scheme, and its final implementation was 
not included in the scope of this paper. There are also 
many characteristics that were not sufficiently addressed
here, such as the influence and implementation of
policies, the fast growth and shrinkage of chains, the influence
of Chain Routing in the economic model of the network 
and the mechanisms which would allow route aggregation 
and scalability. Moreover, a large amount of experimenta- 
tion is needed before a practical implementation of Chain 
Routing is obtained, which includes finding an efficient al- 
gorithm for discovering complete orders in a digraph. 

The conditions under which a network maintains its sta- 
bility were also explored and it was determined that by ap- 
plying complete orders in two realms, temporal and topo- 
logical, it might be possible to obtain a highly stable rout- 
ing protocol which is more resilient to failures than BGP's 
current implementation. This lends support to the asser- 
tion that the topological complete order presented here, 
Chain Routing, has the potential to become a very stable 
solution for routing in the Internet. 

Finally, we believe it is possible that the application of 
Chain Routing could be extended to other systems which 
can be modelled as a digraph that needs to maintain its 
connectivity. Examples of such systems might be overlay 
networks, data center networks or even vehicular traffic 
distribution. 